Finite Sample Corrections for Parameters Estimation and Significance Testing

Authors

  • Boon Kin Teh
  • Darrell JiaJie Tay
  • Sai Ping Li
  • Siew Ann Cheong
Abstract

An increasingly important problem in the era of Big Data is fitting data to distributions. However, many practitioners stop at visually inspecting the fits or use the coefficient of determination as a measure of goodness of fit. In general, goodness-of-fit measures do not allow us to tell which of several distributions fits the data best. Also, the likelihood of drawing the data from a distribution can be low even when the fit is good. To overcome these limitations, Clauset et al. advocated a three-step procedure for fitting any distribution: (i) estimate the parameter(s) accurately, (ii) choose and calculate an appropriate goodness-of-fit measure, and (iii) test its significance to determine how likely this goodness of fit is to appear in samples drawn from the distribution. When we perform this significance testing on exponential distributions, we often obtain low significance values despite the fits being visually good. This led to our realization that most fitting methods do not account for effects due to the finite number of elements and the finite largest element. The former produces a sample-size dependence in the goodness of fit, and the latter introduces a bias in the estimated parameter and the goodness of fit. We propose modifications to account for both and show that these corrections improve the significance of the fits of both real and simulated data. In addition, we use simulations and analytical approximations to verify that the convergence rate of the estimated parameter toward its true value depends on how fast the largest element converges to infinity, and we provide fast inversion formulas to obtain p-values directly from the adjusted test statistics, in place of running more Monte Carlo simulations.
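
The three-step procedure mentioned in the abstract can be illustrated with a minimal sketch for the exponential case. This is the generic, uncorrected procedure only; it does not implement the finite-sample corrections or the inversion formulas the paper proposes. The function names (`fit_exponential`, `ks_statistic`, `ks_p_value`), the choice of the Kolmogorov-Smirnov distance as the goodness-of-fit measure, and the number of simulations are all assumptions made for illustration.

```python
import math
import random


def fit_exponential(sample):
    """Step (i): maximum-likelihood estimate of the rate, 1 / sample mean."""
    return len(sample) / sum(sample)


def ks_statistic(sample, rate):
    """Step (ii): Kolmogorov-Smirnov distance between the empirical CDF
    and the fitted exponential CDF F(x) = 1 - exp(-rate * x)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-rate * x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(cdf - i / n), abs((i + 1) / n - cdf))
    return d


def ks_p_value(sample, n_sim=500, seed=0):
    """Step (iii): Monte Carlo p-value, i.e. the fraction of synthetic
    samples drawn from the fitted distribution whose KS distance is at
    least as large as the observed one."""
    rng = random.Random(seed)
    rate = fit_exponential(sample)
    d_obs = ks_statistic(sample, rate)
    n = len(sample)
    exceed = 0
    for _ in range(n_sim):
        synthetic = [rng.expovariate(rate) for _ in range(n)]
        # Refit every synthetic sample so the simulated statistics include
        # the effect of estimating the parameter from the data.
        d_sim = ks_statistic(synthetic, fit_exponential(synthetic))
        if d_sim >= d_obs:
            exceed += 1
    return rate, d_obs, exceed / n_sim


if __name__ == "__main__":
    rng = random.Random(42)
    data = [rng.expovariate(2.0) for _ in range(1000)]  # simulated data
    rate, d_obs, p = ks_p_value(data)
    print(f"rate={rate:.3f}  KS={d_obs:.4f}  p={p:.3f}")
```

A small p-value here means that samples genuinely drawn from the fitted exponential rarely show a KS distance as large as the observed one, which is the sense of "significance" used in the abstract.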

Related resources

Finite Sample Size Optimality of GLR Tests

In binary hypothesis testing, when the hypotheses are composite or the corresponding data pdfs contain unknown parameters, one can use the well known generalized likelihood ratio test (GLRT) to reach a decision. This test has the very desirable characteristic of performing simultaneous detection and estimation in the case of parameterized pdfs or combined detection and isolation in the case of ...

An Improvement on the Estimation of River ECs using ANN Models and ANFIS involving PCA Analysis, Case Study; Nekarood River, IRAN

Estimation of changes in water quality parameters including electrical conductivity along a river is essential. In this paper, ANN and ANFIS-SC were used to estimate the ECs of the Nekarood River, North Iran, from 1992 to 2013. The study period was divided into dry and wet periods based on the river flow rate. Then, using PCA, the effective parameters in EC estimation were determined...

Determining the Sample size for Estimation of the CCC-R Control Chart Parameters Based on Estimation Costs

In today's highly competitive industrial environment, driven by fast technology development, quality practitioners seek to detect out-of-control situations and take action as soon as possible whenever necessary. Accordingly, new statistical procedures have been developed continually, both to handle high-yield processes and to minimize overall quality cost. CCC-r chart, th...

BASIC STATISTICS FOR BUSY CLINICIANS (V) Statistical inference: Hypothesis testing

The aim of statistical inference is to predict the parameters of a population, based on a sample of data. Inferential statistics encompasses the estimation of parameters and model predictions. The present article describes the hypothesis tests or statistical significance tests most commonly used in healthcare research. © 2010 SEICAP. Published by Elsevier España, S.L. All rights reserved.

Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach

The false discovery rate (FDR) is a multiple hypothesis testing quantity that describes the expected proportion of false positive results among all rejected null hypotheses. Benjamini and Hochberg introduced this quantity and proved that a particular step-up p-value method controls the FDR. Storey introduced a point estimate of the FDR for fixed significance regions. The former approach conserv...

Publication date: 2018